Abstract: We determined the complete mitochondrial genome (mitogenome) of Argyreus hyperbius. The 15 156 bp
genome harbored the typical gene content (13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and an A+T-rich
region), and its gene arrangement was identical to that of all known lepidopteran mitogenomes. Analyses of nucleotide
composition and codon usage showed that the genome had a strong A+T bias, with an A+T content of 80.8% and a
small negative AT skew (-0.019). Eleven intergenic spacers totaling 96 bp and 14 overlapping regions totaling 34 bp
were scattered throughout the genome. As observed in other lepidopteran species, 12 of the 13 protein-coding genes
(PCGs) were initiated by ATN codons, while CGA was tentatively designated as the start codon of the COI gene. A
total of 11 PCGs harbored the complete termination codon TAA, while the COI and COII genes ended with a single T
residue. All 22 tRNA genes showed the typical cloverleaf structure except tRNA-Ser(AGN), in which the
dihydrouridine (DHU) stem is replaced by a simple loop. The junction between tRNA-Ser(UCN) and ND1 also
contained the ATACTAA motif, which is conserved in other lepidopterans. Additionally, the 349 bp A+T-rich region
did not contain large tandem repeats, but harbored a few structures common to other lepidopteran insects, such as the
motif ATAGA followed by a 20 bp poly-T stretch, a microsatellite-like (AT)n element preceded by the ATTTA motif,
and a 5 bp poly-A site immediately upstream of tRNA-Met. The mitogenome features found in this study not only
contribute to the genetic diversity information available for the group, but will also be useful in future studies of this
endangered nymphalid butterfly, including population genetic dynamics, species conservation, phylogeography and
evolution.


Key words: Argyreus hyperbius; Nymphalidae; Lepidoptera; Mitochondrial genome 


The laced fritillary, Argyreus hyperbius Linnaeus, is
an oriental nymphalid butterfly species distributed in
south-east Asia, India, and north-east Africa. In
recent decades, mainly owing to habitat destruction,
numerous local populations have declined sharply,
and the species is now considered endangered in some
countries, including China. Known as the "flying flower",
A. hyperbius was once widespread but is now rarely
found in large cities such as Nanjing (Wu, 2008). To
date, however, this once widely distributed species has
received little attention. Detailed research on
aspects such as population genetic divergence and
phylogeography is required;
thus, our study was conducted to assist in the protection
and better understanding of this butterfly species.

Animal mitochondrial genomes are generally circular
molecules of 15-20 kb and, with a few exceptions, encode
37 genes: 13 protein-coding genes (PCGs), 2 ribosomal
RNA genes (lrRNA and srRNA) and 22 transfer RNA genes,
together with non-coding control elements that regulate
transcription and replication of the mitochondrial genome
(Taanman, 1999). Maternally inherited mtDNA is simple and
stable in structure. The genes are encoded on both strands
and are compactly arranged, with coding segments separated
by no or only very short (a few base pairs) non-coding
spacers, and in some cases adjacent genes overlap.
Therefore, mitochondrial genes and genomes have been used
as tools in studies of phylogenetics, phylogeography,
phylogenetic chronology, and molecular diagnostics (Nardi
et al, 2005; Simonsen et al, 2006), especially with the
aid of PCR methodologies (Kocher et al, 1989; Yamauchi et
al, 2004).

Within the order Lepidoptera, the butterflies
(Rhopalocera) account for nearly 16 000 species, and their
largest family (Nymphalidae) contains approximately
5 000 species (DeVries, 2001). Despite this large
taxonomic diversity, information about nymphalid
mitogenomes is still limited, and to the best of
our knowledge, only a few complete or nearly complete
mitogenomes of nymphalid species are currently
available in GenBank (Tab. 1). Thus, newly added
mitogenome sequences of nymphalid species can provide

Tab. 1 Mitochondrial genomes employed in this study

Family        Subfamily     Species             GenBank Acc. No.  Reference
Papilionidae  Papilioninae  Papilio maraho      NC_014055         From GenBank
Papilionidae  Papilioninae  Teinopalpus aureus  NC_014398         From GenBank
Papilionidae  Papilioninae  Papilio xuthus      EF621724          Feng et al (2010)
Papilionidae  Parnassiinae  Parnassius bremeri  FJ871125          Kim et al (2009)
Pieridae      Pierinae      Pieris melete       NC_010568         Hong et al (2009)
Pieridae      Pierinae      Pieris rapae        HM156697*         Mao et al (2010)
Nymphalidae   Heliconiinae  Acraea issoria      NC_013604         Hu et al (2010)
Nymphalidae   Argynninae    Argyreus hyperbius  JF439070*         This study
Nymphalidae   Apaturinae    Sasakia charonda    NC_014224         Unpublished
Nymphalidae   Calinaginae   Calinaga davidis    HQ658143*         Xia et al (2011)
Nymphalidae   Satyrinae     Melanitis leda      JF905446*         Unpublished
Nymphalidae   Satyrinae     Eumenis autonoe     GQ868707          Kim et al (2010)
Nymphalidae   Danainae      Euploea mulciber    HQ378507          Unpublished
Nymphalidae   Libytheinae   Libythea celtis     HQ378508          Unpublished
Lycaenidae    Theclinae     Coreana raphaelis   DQ102703          Kim et al (2006)

* Unreleased mitochondrial genomes determined by our laboratory.

further insights into their diversity and evolution. In this
study, we sequenced the entire mitogenome of the
nymphalid butterfly Argyreus hyperbius and compared its
nucleotide organization and major characteristics
with those of other butterfly species to improve our
understanding of the mitogenomes and phylogeny of
related butterflies.


1 Materials and Methods 


1.1 Sample and DNA extraction 

Adult A. hyperbius individuals were collected on
Huangshan Mountain in Anhui Province, China, in August
2006 (specimen voucher ZWX09). After
collection, the fresh material was immediately preserved
in 100% ethanol and stored in a -20 °C refrigerator
before genomic DNA extraction.

Whole genomic DNA was extracted and purified using
a modified glass powder method. A rice-grain-sized piece
of thorax muscle was placed in a 10 mL Eppendorf
tube, washed twice with ddH2O, soaked for about 2-3 h,
and then incubated with 500 µL of DNA extraction solution
(5 mmol/L NaCl, 0.5% SDS, 15 mmol/L EDTA, 10 mmol/L
Tris-HCl, pH 7.6) and 40 µL of proteinase K (20 mg/mL).
The sample was then incubated at 55 °C for 10-12 h
and centrifuged at 4 000 r/min for 2 min. The
supernatant was transferred to a new 10 mL Eppendorf
tube, 500 µL of 8 mol/L GuSCN and 40 µL of a
50% clean glass powder suspension were added, and the
mixture was incubated at 37 °C for 1-2 h, rocked for 1
h, and centrifuged at 4 000 r/min for 1 min. The
supernatant was then removed, and the pellet was
washed twice with 75% alcohol and once with acetone
and dried thoroughly in a vacuum dryer at 45 °C. Next,
60 µL of TE (10 mmol/L Tris-Cl, 1 mmol/L EDTA, pH
8.0) was added to the dried powder, which was
incubated at 56 °C for 30 min and then centrifuged for
1 min, with the speed increased slowly to 4 000 r/min. The
supernatant containing the genomic DNA was
transferred to a clean 1.5 mL Eppendorf tube and
stored at -20 °C until use (Hao et al, 2007).

1.2 PCR amplification and sequencing 

Universal primers were used for PCR amplification of
short fragments of the 12S rRNA, COI and Cyt b genes
(Simon et al, 1994; Simons & Weller, 2001).
Long-PCR primers and additional short-fragment primers,
including those for COIII and ND5, were designed from
multiple sequence alignments of all available complete
lepidopteran mitochondrial genomes (Tab. 1) using
ClustalX 1.8 (Thompson et al, 1997) and Primer Premier
5.0 software (Singh et al, 1998).

Long PCRs were performed using TaKaRa LA Taq 
polymerase with the following cycling parameters: initial 
denaturation for 5 min at 95 °C, followed by 30 cycles at 
95 °C for 50 s, 50 °C for 50 s, 68 °C for 2 min and 30 s; 
and a final extension step of 68 °C for 10 min. Short 
fragments were amplified with TaKaRa Taq polymerase: 
initial denaturation for 5 min at 94 °C, followed by 35 
cycles at 94 °C for 1 min, 45-53 °C for 1 min, 72 °C for
2 min, and a final extension step of 72 °C for 10 min. 
The PCR products were detected via electrophoresis in 
1.2% agarose gel, purified using the 3S Spin PCR 
Product Purification Kit and sequenced directly with an 
ABI-3730 automatic DNA sequencer. 

1.3 Sequence analysis 

The determined sequences were first checked with
the NCBI BLAST search function. Raw
sequence files were proofread and assembled in BioEdit
version 7.0 (Hall, 1999) and ClustalX 1.8
(Thompson et al, 1997). Transfer RNA genes were
identified using tRNAscan-SE v.1.21
(Lowe & Eddy, 1997). Putative tRNA genes not found
by tRNAscan-SE were identified by sequence comparison
between A. hyperbius and other lepidopterans. Both
PCGs and ribosomal RNA genes were identified using
ClustalX 1.8, and the nucleotide sequences of the PCGs
were translated on the basis of the invertebrate
mitochondrial genetic code. Nucleotide composition
skewness (AT skew = (A-T)/(A+T), GC
skew = (G-C)/(G+C) (Irwin et al, 1991)) and codon usage
were calculated in MEGA 4.0 (Kumar et al,
2004). The A. hyperbius mitogenome sequence was
deposited in the GenBank database under accession
number JF439070.
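
The skew formulas above can be illustrated with a short Python sketch. This is not the MEGA 4.0 procedure used in the study; it only applies the two definitions to a base count. The function name `skews` and the toy sequence are assumptions made for illustration, and the function will fail on a sequence that lacks A/T or G/C bases.

```python
from collections import Counter


def skews(sequence: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for one strand of a DNA sequence."""
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t)  # negative value: more T than A
    gc_skew = (g - c) / (g + c)  # negative value: more C than G
    return at_skew, gc_skew


# Hypothetical toy fragment, not taken from the A. hyperbius mitogenome.
toy = "ATTTTAATTTGCCATTAATT"
at_skew, gc_skew = skews(toy)
print(f"AT skew = {at_skew:+.3f}, GC skew = {gc_skew:+.3f}")
```

Because skew is strand-specific, the sequence passed in should be read from the strand of interest (J or N).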


2 Results 


2.1 Genome organization 

The mitogenome of A. hyperbius was 15 156 bp in
length (Fig. 1) and encoded 37 genes in total (13 PCGs
(ATP6, ATP8, COI-III, ND1-6, ND4L, Cyt b), 2
ribosomal RNA genes for the small and large subunits
(srRNA and lrRNA), and 22 transfer RNA genes) and a
non-coding A+T-rich region (the control region) (Tab. 2).
Among these, 14 genes were encoded on the N strand,
including 4 PCGs (ND1, ND4, ND4L, ND5), the 2
ribosomal RNA genes, and 8 transfer RNA genes
(tRNA-Gln, tRNA-Cys, tRNA-Tyr, tRNA-Phe, tRNA-His,
tRNA-Pro, tRNA-Leu(CUN) and tRNA-Val). The
remaining 23 genes and the A+T-rich region were encoded
on the J strand. Eleven intergenic spacers totaling 96 bp
and 14 overlapping regions totaling 34 bp were scattered
throughout the genome.

Fig. 1 Circular map of the Argyreus hyperbius
mitochondrial genome (15 156 bp)

2.2 PCGs, tRNA and rRNA genes, A+T-rich region 

Twelve of the 13 PCGs were initiated by ATN 
codons, while the COI gene was tentatively designated 
by the CGA codon; eleven PCGs harbored the complete 
termination codon TAA, while the COI and COII genes 
ended at a single T residue. 

A. hyperbius harbored the typical
set of 22 tRNA genes, ranging from 61 to 71 bp in size.
The predicted secondary structures of all A. hyperbius
tRNAs are shown in Fig. 2. All 22 tRNA genes
showed the typical cloverleaf structure except
tRNA-Ser(AGN), which lacked the dihydrouridine (DHU)
stem; this was replaced by a simple loop. Seventeen tRNA
genes had a total of 26 mismatched base pairs in their
stems: seven in the DHU stems, nine in the
amino acid acceptor stems, one in the TΨC stem, and
nine in the anticodon stems.

As in the mitogenomes of other insects, two
rRNA genes (lrRNA and srRNA) were present in A.
hyperbius. The 1 330 bp lrRNA and 778 bp srRNA were
located between tRNA-Leu(CUN) and tRNA-Val and
between tRNA-Val and the A+T-rich region, respectively.

The 349 bp A+T-rich region did not contain
large tandem repeats, but harbored a few
structures common to other lepidopteran insects, such as
the motif ATAGA followed by a 20 bp poly-T stretch, a
microsatellite-like (AT)n element preceded by the
ATTTA motif, and a 5 bp poly-A site immediately
upstream of tRNA-Met.

2.3 Sequence variation and codon usage

The A+T content of the A. hyperbius mitogenome was
80.8%, showing an obvious A+T bias
(Tab. 3). The relative synonymous codon usage (RSCU)
in the A. hyperbius mitochondrial PCGs was investigated
and the results are summarized in Tab. 4. The four most
frequently used codons were TTA (leucine, Leu), ATT
(isoleucine, Ile), TTT (phenylalanine, Phe), and ATA
(methionine, Met), accounting for 40.4% of all the
codons in the A. hyperbius PCGs. These four
codons are composed entirely of A and T nucleotides,
reflecting the biased codon usage. The total number of
non-stop codons (CDs) in the A. hyperbius mitochondrial
PCGs was 3 718. Among the amino acids, Leu (14.20%), Ile
(12.80%), Phe (10.27%), and Ser (8.50%) were the most
frequently used.
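
The percentages above follow directly from the codon counts in Tab. 4. The short Python sketch below reproduces the arithmetic; the variable names are illustrative and the counts are transcribed from Tab. 4 (written there in the RNA alphabet, so UUA corresponds to TTA).

```python
# Codon counts transcribed from Tab. 4 (RNA alphabet: UUA = TTA, etc.).
top_codons = {"UUA": 439, "AUU": 440, "UUU": 357, "AUA": 266}
total_codons = 3718  # non-stop codons in the 13 PCGs

top_share = 100 * sum(top_codons.values()) / total_codons
print(f"Four most frequent codons: {top_share:.1f}% of all codons")

# Per-amino-acid totals: sum the counts of every codon in the family.
families = {
    "Leu": [439, 21, 42, 4, 22, 0],         # UUR + CUN codons
    "Ile": [440, 36],                       # AUU, AUC
    "Phe": [357, 25],                       # UUU, UUC
    "Ser": [113, 12, 76, 2, 35, 3, 74, 1],  # UCN + AGN codons
}
aa_percent = {aa: 100 * sum(n) / total_codons for aa, n in families.items()}
for aa, pct in aa_percent.items():
    print(f"{aa}: {pct:.2f}%")
```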


3 Discussion 


3.1 Genome organization 

The size of the A. hyperbius mitogenome falls within the
range of other known lepidopteran mitogenomes, from
15 122 bp in Melanitis leda (unpublished, GenBank
accession number JF905446) to 16 094 bp in Papilio
maraho (unpublished, NC_014055). The gene content of
the A. hyperbius mitogenome was the same as that of the
typical animal mitogenome, and the gene order and
orientation were identical to those of previously
determined lepidopteran mitogenomes. Compared with other
lepidopterans, however, the A. hyperbius mitogenome was
relatively compact, with intergenic spacers totaling only
96 bp and ranging from 2 to 52 bp in length. Additionally,
overlapping regions totaling 34 bp were scattered
throughout the genome. The tRNA cluster
upstream of NADH dehydrogenase subunit 2 (ND2) was
arranged in the order M-I-Q, that is, tRNA-Met (M)
was followed by tRNA-Ile (I) and tRNA-Gln (Q), as in
the lycaenid Coreana raphaelis (Kim et al, 2006)
and the notodontid Ochrogaster lunifer (Salvato et al, 2008).
As far as we know, all determined lepidopteran mitogenomes,
including that of A. hyperbius, share the same
gene arrangement, which differs from that of the
hypothesized ancestral insect. This is consistent with the
suggestion of Boore et al (1998) that the Lepidoptera may
have diverged from other insect orders some time ago,
forming an independent evolutionary lineage.
Tab. 2 Organization of the Argyreus hyperbius mitochondrial genome

Gene              Strand*  Position      Size (bp)  Anticodon (position)  IGN**  Start codon  Stop codon
tRNA-Met          J        1-68          68         CAT (32-34)           -2
tRNA-Ile          J        67-131        65         GAT (97-99)           -3
tRNA-Gln          N        129-197       69         TTG (165-167)         52
ND2               J        250-1263      1014                             -2     ATT          TAA
tRNA-Trp          J        1262-1330     69         TCA (1294-1296)       -8
tRNA-Cys          N        1323-1384     62         GCA (1353-1355)       -1
tRNA-Tyr          N        1384-1448     65         GTA (1415-1417)       3
COI               J        1452-2982     1531                             0      CGA          T-tRNA-Leu
tRNA-Leu(UUR)     J        2983-3049     67         TAA (3013-3015)       0
COII              J        3050-3725     676                              0      ATG          T-tRNA-Lys
tRNA-Lys          J        3726-3796     71         CTT (3756-3758)       -1
tRNA-Asp          J        3796-3861     66         GTC (3826-3828)       0
ATP8              J        3862-4023     162                              -7     ATT          TAA
ATP6              J        4017-4694     678                              -1     ATG          TAA
COIII             J        4694-5482     789                              2      ATG          TAA
tRNA-Gly          J        5485-5549     65         TCC (5515-5517)       0
ND3               J        5550-5903     354                              8      ATT          TAA
tRNA-Ala          J        5912-5977     66         TGC (5943-5945)       2
tRNA-Arg          J        5980-6043     64         TCG (6006-6008)       0
tRNA-Asn          J        6044-6109     66         GTT (6075-6077)       -2
tRNA-Ser(AGN)     J        6108-6169     61         GCT (6129-6131)       0
tRNA-Glu          J        6170-6234     65         TTC (6200-6202)       3
tRNA-Phe          N        6238-6302     67         GAA (6268-6270)       -1
ND5               N        6302-8023     1722                             15     ATA          TAA
tRNA-His          N        8039-8105     67         GTG (8073-8075)       -1
ND4               N        8105-9445     1341                             2      ATG          TAA
ND4L              N        9448-9732     285                              2      ATG          TAA
tRNA-Thr          J        9735-9798     64         TGT (9765-9767)       0
tRNA-Pro          N        9799-9865     67         TGG (9831-9833)       4
ND6               J        9870-10397    528                              -1     ATA          TAA
Cyt b             J        10397-11551   1155                             -2     ATG          TAA
tRNA-Ser(UCN)     J        11550-11613   64         TGA (11579-11581)     -2
ND1               N        11612-12565   954                              3      ATA          TAA
tRNA-Leu(CUN)     N        12569-12634   66         TAG (12603-12605)     0
lrRNA (16S)       N        12635-13964   1330                             0
tRNA-Val          N        13965-14029   65                               0
srRNA (12S)       N        14030-14807   778                              0
A+T-rich region            14808-15156   349

* J, majority strand; N, minority strand.

** IGN: intergenic nucleotides between a gene and the gene that follows it; negative numbers indicate overlapping nucleotides between adjacent genes.
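
The IGN column of Tab. 2 can be derived from the gene coordinates alone: the gap is the start of the next gene minus the end of the current gene minus one. The Python sketch below illustrates this for the first four genes; the helper name `intergenic_nucleotides` is hypothetical and the gene labels are written in plain text for readability.

```python
# (gene, start, end) for the first four genes, transcribed from Tab. 2.
genes = [
    ("tRNA-Met", 1, 68),
    ("tRNA-Ile", 67, 131),
    ("tRNA-Gln", 129, 197),
    ("ND2", 250, 1263),
]


def intergenic_nucleotides(annotations):
    """Yield (gene, next_gene, IGN); a negative IGN means an overlap."""
    for (name, _, end), (next_name, next_start, _) in zip(annotations, annotations[1:]):
        yield name, next_name, next_start - end - 1


ign = list(intergenic_nucleotides(genes))
for name, next_name, gap in ign:
    kind = "spacer" if gap > 0 else "overlap" if gap < 0 else "adjacent"
    print(f"{name} -> {next_name}: {gap:+d} bp ({kind})")
```

Summing the positive values over the whole table gives the total spacer length, and summing the negative values gives the total overlap.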
Fig. 2 Predicted secondary cloverleaf structures of
the 22 tRNA genes of Argyreus hyperbius
Tab. 3 Nucleotide composition and skewness in different regions of the Argyreus hyperbius mitogenome

Region                  Size (bp)  T (%)  C (%)  A (%)  G (%)  A+T (%)
Whole genome            15156      41.4   11.7   39.4   7.5    80.8
PCGs on J strand        6887       44.2   12.7   33.7   9.4    77.9
PCGs on N strand        4302       47.7   6.1    34.3   11.9   82.0
First codon positions   3718       39.6   9.9    37.6   15.5   74.5
Second codon positions  3718       48.3   16.1   22.5   13.1   70.8
Third codon positions   3718       51.4   4.5    41.5   2.6    92.9
rRNA                    2108       45.4   10.1   39.4   5.2    84.7
Control region          349        50.1   3.7    45.3   0.9    95.4


Tab. 4 The codon number and RSCU in the Argyreus hyperbius mitochondrial PCGs

Codon   Num.  RSCU   Codon   Num.  RSCU   Codon   Num.  RSCU   Codon   Num.  RSCU
UUU(F)  357   1.87   UCU(S)  113   2.86   UAU(Y)  183   1.80   UGU(C)  31    1.94
UUC(F)  25    0.13   UCC(S)  12    0.30   UAC(Y)  20    0.20   UGC(C)  1     0.06
UUA(L)  439   4.99   UCA(S)  76    1.92   UAA(*)  0     0.00   UGA(W)  91    1.96
UUG(L)  21    0.24   UCG(S)  2     0.05   UAG(*)  0     0.00   UGG(W)  2     0.04
CUU(L)  42    0.48   CCU(P)  65    2.20   CAU(H)  60    1.74   CGU(R)  16    1.21
CUC(L)  4     0.05   CCC(P)  14    0.47   CAC(H)  9     0.26   CGC(R)  3     0.23
CUA(L)  22    0.25   CCA(P)  37    1.25   CAA(Q)  60    1.97   CGA(R)  30    2.26
CUG(L)  0     0.00   CCG(P)  2     0.07   CAG(Q)  1     0.03   CGG(R)  4     0.30
AUU(I)  440   1.85   ACU(T)  95    2.47   AAU(N)  238   1.85   AGU(S)  35    0.89
AUC(I)  36    0.15   ACC(T)  7     0.18   AAC(N)  19    0.15   AGC(S)  3     0.08
AUA(M)  266   1.83   ACA(T)  51    1.32   AAA(K)  97    1.81   AGA(S)  74    1.87
AUG(M)  25    0.17   ACG(T)  1     0.03   AAG(K)  10    0.19   AGG(S)  1     0.03
GUU(V)  60    2.02   GCU(A)  68    2.23   GAU(D)  62    1.85   GGU(G)  46    0.93
GUC(V)  1     0.03   GCC(A)  10    0.33   GAC(D)  5     0.15   GGC(G)  0     0.00
GUA(V)  53    1.78   GCA(A)  42    1.38   GAA(E)  66    1.81   GGA(G)  139   2.82
GUG(V)  5     0.17   GCG(A)  2     0.07   GAG(E)  7     0.19   GGG(G)  12    0.24
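
The RSCU values in Tab. 4 can be checked with the usual definition: the observed count of a codon divided by the mean count over all synonymous codons of the same amino acid. The Python sketch below is a minimal illustration of that definition, not the MEGA implementation; the function name `rscu` is an assumption.

```python
def rscu(family_counts: dict[str, int]) -> dict[str, float]:
    """RSCU = observed count / mean count over the synonymous codons."""
    mean = sum(family_counts.values()) / len(family_counts)
    return {codon: n / mean for codon, n in family_counts.items()}


# Counts from Tab. 4 for a two-fold (Phe) and a four-fold (Gly) family.
phe = rscu({"UUU": 357, "UUC": 25})
gly = rscu({"GGU": 46, "GGC": 0, "GGA": 139, "GGG": 12})
for codon, value in {**phe, **gly}.items():
    print(f"{codon}: {value:.2f}")
```

An RSCU above 1 means the codon is used more often than expected under equal usage within its family, which is the case for the A- and U-ending codons throughout the table.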


3.2 Protein-coding genes 

All protein-coding sequences except the COI gene use a
standard ATN start codon in A. hyperbius (Tab. 2). Three
PCGs (ND5, ND1 and ND6) were initiated by ATA
(Met), six PCGs (COII, ATP6, COIII, ND4, ND4L and
Cyt b) by ATG (Met), and three PCGs
(ND2, ATP8 and ND3) by ATT (Ile).
The COI gene, however, commonly uses non-canonical
start codons, as reported for a number of insect groups.
For example, Junqueira et al (2004) and Friedrich &
Muqim (2003) proposed AAA or TCG as the start codon
for COI in the dipteran Chrysomya chloropyga and the
coleopteran Tribolium castaneum, respectively. Other
studies have determined that TTG is the initiation codon
for COI in some invertebrates such as Anopheles
quadrimaculatus (Mitchell et al, 1993), Pyrocoelia rufa
(Bae et al, 2004), Caligula boisduvalii (Hong et al, 2008)
and Acraea issoria (Hu et al, 2010). In addition, the
tetranucleotide TTAG in Coreana raphaelis (Kim et al,
2006), the hexanucleotide TATTAG in Ostrinia nubilalis
and Ostrinia furnacalis (Coates et al, 2005), TTTTAG in
Bombyx mori (Yukuhiro et al, 2002), ATTACG in
Papilio xuthus (Feng et al, 2010), and TTAAAG in
Pieris rapae (Mao et al, 2010) have also been proposed
as the COI start signal. In the case of A. hyperbius, we
tentatively presumed CGA as the start codon for COI,
in agreement with Parnassius bremeri (Kim et al,
2009), Eumenis autonoe (Kim et al, 2010), and
Hyphantria cunea (Liao et al, 2010). Besides ATN, GTN
has also been reported in Heterocera as the initiation
codon for some PCGs. For instance, GTG has been
reported as the start codon for COII in Caligula
boisduvalii (Hong et al, 2008) and Eriogyna pyretorum
(Jiang et al, 2009), and for ND1 in Ochrogaster lunifer
(Salvato et al, 2008). Furthermore, ND4 and ND4L in
Ochrogaster lunifer use GTT as their initiation codon.

Eleven of the 13 protein-coding genes had the
common stop codon TAA, while COI and COII
terminated with a single T residue in the A. hyperbius
mitogenome. Similar cases have been found in most
insect mitogenomes, including all known lepidopteran
mitogenomes. For example, a single T residue has been
deemed the stop codon for COI, COII, ND5 and Cyt b,
and the dinucleotide TA has been deemed the stop
codon for ATP6, ND4, ND4L and ND6 in Coreana
raphaelis (Kim et al, 2006); similarly, a single T has
been considered the stop codon for COI, COII and ND4,
while TA is considered the stop codon for ATP6
in Hyphantria cunea (Liao et al, 2010). Incomplete stop
codons are completed to functional TAA stop codons by
polyadenylation after cleavage of the polycistronic
transcript (Ojala et al, 1981).
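
An incomplete stop codon can be recognized from the annotated gene length: a gene that starts in frame and ends with a lone T (or TA) has a length that leaves a remainder of 1 (or 2) when divided by 3. The Python sketch below applies this check to sizes from Tab. 2. The function name is hypothetical, and the sketch assumes that the annotation begins at an in-frame start codon and that any leftover bases are T or TA.

```python
def stop_codon_status(gene_length: int) -> str:
    """Classify the 3' end of a PCG from its annotated length in bp."""
    remainder = gene_length % 3
    if remainder == 0:
        return "whole codons (complete stop codon possible)"
    # One or two leftover bases (assumed T or TA) become UAA once the
    # poly-A tail is added to the cleaved transcript.
    leftover = "T" if remainder == 1 else "TA"
    return f"incomplete stop codon ({leftover}), completed by polyadenylation"


# Annotated sizes from Tab. 2.
sizes = {"COI": 1531, "COII": 676, "ND2": 1014, "ATP6": 678}
status = {gene: stop_codon_status(bp) for gene, bp in sizes.items()}
for gene, text in status.items():
    print(f"{gene} ({sizes[gene]} bp): {text}")
```

COI (1 531 bp) and COII (676 bp) both leave a remainder of 1, matching the single T residue reported for these two genes.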

Three of the 13 PCGs (ATP8, ATP6, ND6) in A.
hyperbius were flanked by another PCG at the 3' end:
ATP8-ATP6, ATP6-COIII and ND6-Cyt b
overlapped by seven (ATGATAA), one (A) and one (A)
nucleotides, respectively. The 3' end regions of these three
genes had the potential to form hairpin-like structures,
which are thought to be crucial for precise cleavage of the
transcript into mature mRNAs (Kim et al, 2006; Fenn et al, 2007).


Genes encoded by the N strand are underlined.
The tRNA genes are designated by single-letter amino
acid codes. L* and S* denote tRNA-Leu(UUR) and
tRNA-Ser(UCN), respectively.

3.3 Transfer RNA and ribosomal RNA genes

All the tRNA genes showed the typical cloverleaf
structure, with the exception of the tRNA-Ser(AGN) gene,
in which the dihydrouridine (DHU) stem was replaced by a
simple loop. This feature has also been detected in
other insect groups (Wolstenholme, 1992), including
lepidopterans (Hong et al, 2008; Kim et al, 2006; Salvato
et al, 2008; Liao et al, 2010). Seventeen tRNA genes had
a total of 26 mismatched base pairs in their stems,
comprising eighteen G-U, seven U-U, and one A-C
pair. Such mismatches in tRNAs may be
corrected through RNA-editing mechanisms (Lavrov et
al, 2000). To date, however, the mechanism of these
modifications in insect tRNA genes is not well
understood, although some researchers have proposed
a connection with the rapid evolution of insect species
(Takashi et al, 1991; Watanabe & Watanabe, 1994).

The two rRNA genes were within the size range observed
in known lepidopteran mitogenomes. The
1 330 bp lrRNA was well within the range of other known
lepidopterans (from 1 319 bp in Pieris melete (Hong et al,
2009) to 1 426 bp in H. cunea (Liao et al, 2010)). The
size of the srRNA (778 bp) was likewise
within the range observed in other lepidopteran
insects (from 434 bp in Ostrinia nubilalis (Coates et al,
2005) to 808 bp in H. cunea).

3.4 Intergenic spacer sequences

Because of their rapid evolutionary rates, intergenic
spacer sequences (IGS) show remarkable differences
even among closely related insect species. Apart from the
A+T-rich region, the A. hyperbius mitogenome
contained 11 intergenic spacers totaling
96 bp and ranging in size from 2 to 52 bp (Fig. 1). The
longest spacer (52 bp), located between the tRNA-Gln and
ND2 genes, is a feature common to all lepidopteran
mitogenomes, but has not yet been detected in
non-lepidopteran species. This spacer showed a relatively
high level of sequence similarity (62%) with the ND2 gene,
which is comparable to the 70% detected in Parnassius bremeri
(Kim et al, 2009) but markedly different from the
32% in Sasakia charonda (unpublished, NC_014224).
Accordingly, this spacer is thought to have originated
from a partial duplication of the ND2 gene and, owing to
its non-coding nature, to have undergone rapid sequence
divergence even among closely related taxa (Kim et al,
2009). The only other IGS longer than 10 bp was located
between ND5 and tRNA-His, and this 15 bp
intergenic spacer exists in 15 of the 27 determined
lepidopteran mitogenomes. Furthermore, a relatively
conserved ATTTT element was
present within this spacer, as has also been found in
the great majority of determined insect species.
The IGS between tRNA-Ser(UCN) and ND1
is common among lepidopteran insects, ranging from 9
bp in Diatraea saccharalis (unpublished, NC_013274) to
38 bp in Ostrinia nubilalis (Coates et al, 2005). In the
present study, however, this spacer was absent in A.
hyperbius, where the two genes instead overlap by 2 bp,
which is similar to findings for Acraea issoria (Hu et al,
2010), Sasakia charonda (unpublished, NC_014224), and
Calinaga davidis (Xia et al, 2011; HQ658143), with
overlaps of 2, 1 and 1 bp, respectively. The conserved
ATACTAA motif is regarded as a possible recognition site
for the transcription termination peptide (mtTERM protein)
and is usually located in the IGS between the tRNA-Ser(UCN)
and ND1 genes. However, this motif was detected within
the ND1 gene of A. hyperbius. This is the same as in S.
charonda and C. davidis, whereas the motif lies within
tRNA-Ser(UCN) in Eumenis autonoe (Kim et al, 2010)
and is absent in Sasakia charonda kuriyamaensis
(unpublished, NC_014223).

3.5 A+T-rich region

The A+T-rich region harbors the origin sites for
transcription and replication (Taanman, 1999). In
Drosophila species, this region includes the replication
origin for the mtDNA heavy and minor strands
(Clary & Wolstenholme, 1987). Saito et al (2005)
precisely determined that the replication origin of the
mtDNA minor strand is located in this region in
Bombyx mori (Yukuhiro et al, 2002). In the present study,
the A+T-rich region of the A. hyperbius mitochondrial
genome was located between the srRNA and tRNA-Met
genes (Tab. 2) and was 349 bp in length. This is well
within the range observed in completely sequenced
lepidopteran insects, from 317 bp in Melanitis leda
(unpublished, by our lab) to 747 bp in Bombyx
mandarina (Liao et al, 2010). The A+T-rich region
exhibited a remarkably high A+T content (95.41%) and
did not contain macrorepeat units. However, it included
some microsatellite-like repeats (e.g. poly-T, (AT)n, (TA)n
and poly-A), as seen in other insect species. For example,
the poly-T stretch (20 bp), which is considered a
structural signal recognized by proteins involved in
initiating mtDNA minor-strand replication (Kim et al,
2009), was located 24 bp downstream from srRNA and was
preceded by the motif ATAGA, which is also conserved
across the Lepidoptera. The microsatellite-like (AT)n
element, located 235 bp downstream from srRNA, was
preceded by the conserved motif ATTTA, which is
similar to ATTTA(TA)n in Manduca sexta (Cameron et
al, 2008), ATTTA(AT)n in Hyphantria cunea (Liao et al,
2010), ATTTA(AT)n in Coreana raphaelis (Kim et al,
2006), and ATTTA(AT)n in Pieris rapae (Mao et al,
2010). Thus, this arrangement may be characteristic of
insect A+T-rich regions. Additionally, another
microrepeat unit, (TA)n, and a 5 bp poly-A stretch
were situated 284 bp downstream from srRNA
and immediately upstream of tRNA-Met, respectively.
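
Motifs of the kind described above can be located with regular expressions. The Python sketch below is illustrative only: the fragment is a hypothetical sequence built for the example and is not the A. hyperbius A+T-rich region, and the minimum run lengths (10 T residues, 5 AT repeats) are assumed thresholds rather than values from this study.

```python
import re

# Hypothetical A+T-rich fragment built for illustration only; it is NOT the
# A. hyperbius sequence. It contains the two motif types described above.
fragment = "AATTATAGA" + "T" * 20 + "AAATTAATTTA" + "AT" * 8 + "TTAAAAA"

# ATAGA followed directly by a run of at least 10 T residues.
poly_t = re.search(r"ATAGA(T{10,})", fragment)
# ATTTA followed directly by at least 5 AT dinucleotide repeats.
microsat = re.search(r"ATTTA((?:AT){5,})", fragment)

print("ATAGA + poly-T at index", poly_t.start(),
      "with", len(poly_t.group(1)), "T residues")
print("ATTTA + (AT)n at index", microsat.start(),
      "with", len(microsat.group(1)) // 2, "repeats")
```

Indices are zero-based positions within the fragment; for a real annotation they would need to be offset by the start coordinate of the region.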

3.6 Sequence variation and codon usage

The AT skew values of the J strand (majority or
heavy strand) and the N strand (minority or light strand)
were -0.135 and -0.163, respectively, indicating
more Ts than As on both the J and N
strands, whereas the GC skew values of the J and N
strands were -0.149 and 0.322, respectively, indicating
more Cs than Gs on the J strand but more Gs than Cs on
the N strand.
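
As a worked check, these four values can be reproduced from the percentage compositions listed in Tab. 3 for the genes on the J and N strands, using the skew definitions given in Section 1.3. The subscripts J and N below are notation introduced here for the two strands.

```latex
\begin{align*}
\text{AT skew}_J &= \frac{A - T}{A + T} = \frac{33.7 - 44.2}{33.7 + 44.2} = \frac{-10.5}{77.9} \approx -0.135 \\
\text{AT skew}_N &= \frac{34.3 - 47.7}{34.3 + 47.7} = \frac{-13.4}{82.0} \approx -0.163 \\
\text{GC skew}_J &= \frac{G - C}{G + C} = \frac{9.4 - 12.7}{9.4 + 12.7} = \frac{-3.3}{22.1} \approx -0.149 \\
\text{GC skew}_N &= \frac{11.9 - 6.1}{11.9 + 6.1} = \frac{5.8}{18.0} \approx 0.322
\end{align*}
```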

For the 13 PCGs, the A+T content at the third codon
position (92.9%) was higher than at the first (74.5%) and
second positions (70.8%). The overall A+T content
of the PCGs was 79.4%, showing a strong A+T bias. Similar
values have been observed in other insect species; for
example, the A+T contents of the Sasakia charonda, Coreana
raphaelis, Parnassius bremeri and Helicoverpa armigera PCGs
have been reported to be 78.2%, 81.5%, 80.1% and
79.4%, respectively.

The relative synonymous codon usage (RSCU)
analysis showed that TTA, ATT, TTT, and ATA were
the four most frequently used codons, accounting for
40.4% of all codons in the A. hyperbius PCGs.
These four codons are composed entirely of A and T
nucleotides, reflecting the biased codon usage. Similar
results have been reported for other sequenced
lepidopteran insects. For example, these four codons
account for 39.1% in Teinopalpus aureus, 44.1% in
Coreana raphaelis, and 40.7% in Helicoverpa armigera.
Among amino acids, Leu, Ile, Phe, and Ser were the most
frequently used in the A. hyperbius mitogenome PCGs,
which is in agreement with findings for other
lepidopteran insects (Fig. 3). The total number of
non-stop codons (CDs) in the A. hyperbius mitochondrial
PCGs was 3 718, which lies within the range for other
known butterfly species, from 3 695 in Sasakia charonda
to 3 737 in Calinaga davidis. The codons per thousand
codons (CDspT) values of Ile, Leu2 and Phe were above
100, those of Met, Asn (asparagine), Gly (glycine),
Ser2 and Tyr (tyrosine) were above 50, and those of Arg
(arginine), Asp (aspartic acid), Glu (glutamic acid), Gln
(glutamine), His (histidine) and Leu1 were below 20,
with Cys (cysteine) the lowest at 8.61 in the A. hyperbius
mitochondrial PCGs. Both the CDs and CDspT of A.
hyperbius in this study showed patterns similar to those
of other Papilionoidea butterfly species (Fig. 3).
Fig. 3 Codon distribution in Papilionoidea mtDNAs
The ND5, ND4, ND4L and ND1 genes were converted to the same orientation as the rest of the protein-coding genes. Numbers to the right refer to the total number of
codons (CDs); the scale to the left refers to codons per thousand codons (CDspT). Codon families are provided on the x axis. Leu1, Leu2, Ser1 and Ser2
are defined on the basis of the codon families they decode, namely Leu1 (CUN), Leu2 (UUR), Ser1 (AGN) and Ser2 (UCN).